BACTERIAL YGJD POLYPEPTIDE FAMILY



This invention relates to a family of bacterial polypeptides which are required for
growth of both gram negative and gram positive bacteria, the genes which encode
them and the use of such polypeptides and genes as tools for identifying novel broad
spectrum antibiotics.

New antibiotics are urgently needed in current medical practice as both serious
bacterial infections and multiply antibiotic resistant strains are becoming
increasingly prevalent (Proc. Natl. Acad. Sci. USA (1994) 91:2420-2427; New
England J. Med. (1994) 330:1247-1251). The increase in the number of serious
infections has been ascribed to a variety of causes, including: 1) increasing age of
the general population, 2) increasingly long and complex surgeries and 3) a growing
immuno-suppressed population associated with cancer therapies, organ transplants
and HIV infection. Overuse of antibiotics in both medical and agricultural settings,
improper sanitation and a general lack of concern about antibiotic resistant
organisms have all contributed to the increasing frequency of multiply antibiotic
resistant bacteria. Taken together, these two trends suggest that we will soon be
faced with bacterial infections which are resistant to all therapies. Indeed, the first
report of vancomycin-resistant S. aureus has just been published (Lancet (1997)
350:1670-1673).

Identification of conserved essential proteins is a key step in the development of
broad-spectrum antibiotics. If a target protein is conserved across taxonomic lines,
the possibility that antibiotics acting on that protein will be effective on a wide range
of bacteria is maximized. As examples, DNA gyrase and RNA polymerase are
found in all bacteria, which helps to explain why quinolones and rifampicin are good
broad-spectrum antibiotics. However, not all bacteria synthesize peptidoglycan,
which explains why β-lactam antibiotics are ineffective against Chlamydia,
Rickettsia and Legionella species. The recent publication of several complete
eubacterial genomic sequences (Science (1995) 270:397-403; Science (1997)
277:1453-1474; Nature (1997) 390:249-256) allows the identification of bacterial
proteins which have orthologues in all of the sequenced genomes. This approach
has led to the identification of many conserved protein families (Science (1997)
278:631-637). In some cases a biochemical function for the conserved family may
be deduced from the predicted amino acid sequence. In other cases no function can
be predicted for the protein family. However, it is impossible to predict the
physiological role of a protein or protein family without detailed characterisation of
at least one family member.

Following identification of a conserved bacterial protein family, the protein must be
shown to be essential for bacterial viability if it is to serve as an antibiotic target.
Genetic systems have been developed to demonstrate a gene's essentiality in both E.
coli (J. Bacteriol. (1997) 179:6228-6237) and B. subtilis (Genes Dev. (1991)
177:4194-4197). In some instances these systems suffer either from a reliance on
negative data (failure to disrupt a given gene) or from insufficient repression of the
candidate gene, either of which can lead to misidentification of a gene's essentiality.
Clean data from taxonomically diverse bacteria, such as gram negative and gram
positive strains, offer the best evidence that a conserved bacterial protein family is
essential for viability and will make a good broad-spectrum antibiotic target.

We have identified a family of conserved bacterial genes which we have designated
the ygjD gene family, after the name given to the E. coli gene family member.
These genes have not previously been isolated, nor the polypeptides expressed, as no
function has been ascribed to them. It has now been discovered that this
family of genes encodes a family of polypeptides which are essential for the survival
of their host bacteria.

The invention therefore provides an isolated polypeptide of the ygjD family as 
defined below particularly for use in the identification of novel antibiotic agents. 
The polypeptides of the present invention are believed to be essential to the viability 
of a wide range of bacteria including both gram positive and gram negative bacteria. 

Any one of the following three methods may be used to identify members of the
ygjD family as claimed herein:

BLAST searches (J. Mol. Biol. (1990) 215:403-10 and Meth. Enzymol. (1996)
266:131-141, 227-258, both incorporated herein by reference) may be carried out
using the ygjD family member sequences as described in Figure 1. Such searches
involve using in succession, as query sequences, each of the existing ygjD protein
family member sequences to identify other full length members of the ygjD family
of proteins. Such family members yield high-scoring segment pair (HSP) scores of
greater than 100 in comparison to at least one member of the ygjD family when the
BLAST algorithm described in the reference above is used with a particular scoring
matrix (a BLOSUM62 matrix, Proteins (1993) 17:49-61, incorporated herein by
reference).
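The successive-query procedure described above may be sketched as follows. This is a minimal illustration under stated assumptions and not the search actually used to compile Figure 1: `search` is a hypothetical callable standing in for a BLAST run that returns (subject identifier, HSP score) pairs, and all identifiers and scores in the usage lines are invented.

```python
from typing import Callable, Iterable, List, Set, Tuple

Hit = Tuple[str, float]  # (subject identifier, HSP score)


def expand_family(
    known_members: Iterable[str],
    search: Callable[[str], List[Hit]],
    min_score: float = 100.0,
) -> Set[str]:
    """Query with each known member in turn; keep subjects scoring above min_score."""
    family = set(known_members)
    # Only the existing members are used as queries, as in the text above.
    for query in list(family):
        for subject, score in search(query):
            # A score greater than 100 against at least one member is required.
            if score > min_score:
                family.add(subject)
    return family


# Hypothetical search results standing in for real BLAST output.
fake_hits = {
    "ecoli_ygjD": [("hinf_orf", 412.0), ("unrelated_1", 38.0)],
    "bsub_orth": [("mgen_orf", 151.0), ("unrelated_2", 100.0)],
}
family = expand_family(fake_hits.keys(), lambda q: fake_hits.get(q, []))
print(sorted(family))
```

A subject need only exceed the threshold against one query to be retained, which mirrors the "at least one member" wording of the criterion.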

Profile based searches (Proceedings of the Second International Conference on
Intelligent Systems for Molecular Biology, pp28-36, AAAI Press, Menlo Park,
California, 1994, incorporated herein by reference) may be carried out using
position-dependent scoring matrices defined for the ygjD family members. These
searches use a table compiled from a multiple sequence alignment which describes
distinctive sequences of amino acids as probability values for each residue at each
position in the gene family to identify other proteins which contain similar
sequences of amino acids.
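By way of illustration only, scoring a sequence against a position-dependent matrix of residue probabilities may be sketched as below. The three-column matrix, the uniform background frequency of 0.05 and the pseudo-probability floor are all assumptions made for the example; the sketch computes a simple log-odds sum and does not reproduce the p-value calculation of any particular search program.

```python
import math
from typing import Dict, List, Tuple

Column = Dict[str, float]  # residue -> probability at one motif position


def log_odds_score(window: str, pssm: List[Column],
                   background: float = 0.05, pseudo: float = 0.01) -> float:
    """Sum of log2(p_position(residue) / background) over an ungapped window."""
    if len(window) != len(pssm):
        raise ValueError("window and matrix must have the same length")
    return sum(
        # Residues absent from a column receive the small floor probability.
        math.log2(max(column.get(residue, 0.0), pseudo) / background)
        for residue, column in zip(window, pssm)
    )


def best_window(sequence: str, pssm: List[Column]) -> Tuple[int, float]:
    """Return (start index, score) of the highest-scoring window in the sequence."""
    width = len(pssm)
    scored = [
        (start, log_odds_score(sequence[start:start + width], pssm))
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored, key=lambda item: item[1])


# Hypothetical three-column matrix; real values would come from the alignment.
toy_pssm = [{"G": 0.9, "A": 0.1}, {"G": 0.8, "S": 0.2}, {"H": 0.95, "N": 0.05}]
start, score = best_window("MKLGGHTA", toy_pssm)
print(start, round(score, 2))
```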

Motif based searches (Nucleic Acids Res. (1995) 24:189-196, incorporated herein
by reference) may be carried out using PROSITE patterns defined for the ygjD
family members. These searches involve the representation, as patterns, of the
conserved sequence elements identified in the profile searches.

The isolated polypeptides of the invention may therefore be characterised by:

i) an HSP score of greater than or equal to 100 when compared with one of the
sequences of Figure 1 when the BLAST algorithm is used with a BLOSUM62
scoring matrix; or

ii) containing a set of amino acid sequences which are positively identified
when position dependent scoring matrices according to Tables 1-4 are used
with MAST to yield a p-value of less than 1 x 10^-50; or

iii) comprising at least one of the following amino acid sequences:



[LIV](2)-[SCT]-G-G-H-X(17,21)-D-D-[AST]-X-G-E-X(2)-D-K; 
A-X(3)-P-G-L-X(3)-l-X(2)-G-X(13)-P-X(5)-H-X(3)-H 
[VIL]-l-[GSAT]-[VILFM]-E-[TS]-[TS]-C-D-[DE]; and
G-[LIV]-V-P-E-[LIV]-A-[AST]-R-X-H; 

wherein

the letters denote an amino acid in one letter code,
the square brackets denote a single amino acid,
the amino acids within the square brackets are alternatives,
X is any one amino acid residue, and
the numbers in the curved brackets refer to the number of residues at that
position;

or 

iv) [KR]-[GSAT]-X(4)-[FYWLH]-[DQNGK]-X-P-X-[LIVMFY]-X(3)-H-X(2)-[AG]-H-[LIVM]

wherein

the letters denote an amino acid in one letter code,
the square brackets denote a single amino acid,
the amino acids within the square brackets are alternatives,
X is any one amino acid residue, and
the numbers in the curved brackets refer to the number of residues at that
position.
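The pattern notation defined above maps directly onto a regular expression, which gives one convenient way of testing a sequence for a motif. The following is a minimal sketch, assuming patterns are written exactly in the notation used here; the function name `prosite_to_regex` and the test sequence are invented for the example.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a pattern in the notation above into a Python regular expression."""
    parts = []
    for token in pattern.strip().rstrip(";").split("-"):
        # An element is a single letter or a bracketed set, optionally
        # followed by a repeat count (n) or a range (n,m).
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Z])(?:\((\d+)(?:,(\d+))?\))?", token)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {token!r}")
        element, low, high = m.groups()
        regex = "." if element == "X" else element  # X is any one residue
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


pattern = "G-[LIV]-V-P-E-[LIV]-A-[AST]-R-X-H"
regex = prosite_to_regex(pattern)
match = re.search(regex, "MKGLVPELAARAHTT")  # invented test sequence
print(regex, match.start() if match else None)
```

Repeat counts and ranges are handled in the same way, so that an element such as X(17,21) becomes a bounded wildcard run.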

In a preferred aspect of the invention all three of the amino acid sequences listed
under iii) are present.

The invention also provides an isolated polypeptide sequence as set out in any of
Figures 2a-d.

The polypeptides are preferably recombinant and ideally purified to homogeneity.

Also included as polypeptides according to the invention are variants, analogues and
derivatives, particularly those in which a number of amino acids have been
substituted, deleted or added. Polypeptides which have at least 70% identity to any
of the polypeptide sequences according to the invention, in particular the sequences
of Figures 2a-d, are encompassed within the invention. Preferably the identity is at
least 80%, more preferably at least 90% and still more preferably at least 95%, for
example 97%, 98% or even 99%, to any of the
sequences according to the invention, in particular the sequences of Figures 2a-d.
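For illustration, percentage identity between two sequences that have already been aligned may be computed as sketched below. The choice of denominator (all alignment columns in which at least one sequence has a residue) is an assumption; other conventions exist and give slightly different values. The two short sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Invented ten-residue example with two substitutions.
identity = percent_identity("MRVLGIETSC", "MRILGIDTSC")
print(identity, identity >= 70.0)
```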

Such polypeptides may also be fragments. In this regard a fragment is a part of a 
polypeptide according to the invention which retains sufficient identity of the 
original polypeptide to be effective for example in a screen. Such fragments may be 
fused to other amino acids or polypeptides or may be comprised within a larger 
polypeptide. Such a fragment may be comprised within a precursor polypeptide 
designed for expression in a host. Therefore in one aspect the term fragment means a 
portion or portions of a fusion polypeptide or polypeptide derived from a 
polypeptide according to the invention. 

Fragments also include portions of a polypeptide according to the invention 
characterised by structural or functional attributes of a polypeptide according to the 
invention. These may have similar or improved chemical or biological activity or 
reduced side-effect activity. For example fragments may comprise an alpha helix or 
alpha-helix forming region, beta sheet and beta-sheet forming region, turn and turn 
forming regions, coil and coil-forming regions, hydrophilic regions, hydrophobic 
regions, amphipathic regions (alpha or beta), flexible regions, surface-forming 
regions, substrate binding regions and regions of high antigenic index. 

Fragments or portions may be used for producing the corresponding full length 
polypeptide by peptide synthesis. 

Specific polypeptides according to the invention include the polypeptides of
Borrelia burgdorferi, Treponema pallidum, Synechocystis sp. Strain PCC6803,
Helicobacter pylori, Arabidopsis thaliana, Haemophilus influenzae, Mycobacterium
tuberculosis, Mycobacterium leprae, Pasteurella haemolytica, Mycoplasma
genitalium, Mycoplasma pneumoniae, Streptococcus pneumoniae, Streptococcus
pyogenes, Bacillus subtilis and Escherichia coli.

The present invention further provides isolated polynucleotides which encode the
polypeptides as defined herein, polynucleotides complementary thereto, or
polynucleotides hybridising to any of the aforesaid polynucleotides. Isolated
polynucleotides have been removed by separation from their natural environment
and those materials with which they are naturally associated. Preferably these
polynucleotide molecules are provided in recombinant form (i.e. combined with one
or more heterologous sequences).

Polynucleotide molecules which hybridise to polynucleotides encoding substances of 
the present invention, or to complementary polynucleotides thereto, preferably do so 
under stringent hybridisation conditions. One example of stringent hybridisation 
conditions which is sometimes used is where attempted hybridisation is carried out at 
a temperature of from about 35°C to about 65°C using a salt solution which is about 
0.9 molar. However, the skilled person will be able to vary such conditions as 
appropriate in order to take into account variables such as probe length, base 
composition, type of ions present, etc. 

The invention also provides polynucleotide variants, analogues, derivatives and 
fragments which encode polypeptides according to the invention. Polynucleotides 
are included which preferably have at least 70% identity over their entire length to a 
polynucleotide encoding a polypeptide according to the invention, most preferably 
those set out in Figures 2a-d. More preferred are those sequences which have at least 
80% identity over their entire length to a polynucleotide encoding a polypeptide 
according to the invention. Even more preferred are polynucleotides which 
demonstrate at least 90% for example 95%, 97%, 98% or 99% identity over their 
entire length to a polynucleotide encoding a polypeptide according to the invention. 

Polynucleotide molecules of the present invention may be used as probes for other 
members of the gene family or in anti-sense therapy to block or to reduce the 
expression of one or more of the polypeptides of the invention. Since these substances 
are believed to be essential to the bacteria expressing them, blocking or reducing their 
expression can provide an effective way of treating bacterial mediated diseases or 
disorders. Polynucleotides may also be used directly in screening and in generating
whole cell screens by expression of a polypeptide of the invention.

As part of the isolation process or thereafter the polynucleotides may be joined to
other polynucleotides such as to form fusions or to regulatory elements for
expression. Isolated polynucleotides alone or joined to other polynucleotides can be
introduced into a vector which itself will contain other elements of DNA or RNA
for expression in a host cell. The invention therefore comprises a vector containing
a polynucleotide generally operatively linked to appropriate expression control
sequences.

Vectors for use in the invention include plasmid vectors, phage vectors and DNA or
RNA viral vectors. These vectors may include gene sequences which render them
inducible under certain conditions, such as manipulation of the environmental
conditions under which the host cells are maintained, for example by temperature
alteration or nutrient additives. Regulatory sequences include for example a
promoter to direct mRNA transcription. Such promoters include for example the E. coli
lac, trp, tac and araBAD promoters as well as the SV40 early and late promoters. Such
systems and sequences would be well known to those skilled in the art.

Host cells expressing a polynucleotide of the present invention can be generated by
any of the traditional routes such as transfection or electroporation; see for example
Davis et al, Basic Methods in Molecular Biology (1986) and Sambrook et al,
Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Lab.
Press, Cold Spring Harbor, N.Y. (1989).

This invention also provides a method for identification of molecules, such as
antagonists, that bind to the polypeptide or a polynucleotide encoding a polypeptide
of the present invention.

Selective whole-cell screens combine the sensitivity and specificity of in vitro
biochemical assays with the direct demonstration of in vivo activity seen in whole
cell screens. Biochemical assays for inhibition of polypeptide activity with purified
polypeptides or bacterial extracts can be more sensitive than whole cell killing
assays and provide direct evidence for a compound's mode of action. However, this
approach requires that the target polypeptide is known and that the activity of the
polypeptide is amenable to in vitro assays. Nor does it address other factors, such
as membrane permeability or compound stability, which can limit a compound's
effectiveness as an antibiotic.

Whole cell screening of compounds for killing activity will identify molecules
which kill cells at the concentrations tested, but provides no information on the mode
of action of the compound and may not have the sensitivity needed to detect less
potent compounds. Bacterial strains which contain surrogate markers whose activity
is linked to that of the target gene or which have been engineered to over-express or
under-express the target polypeptide can be used for selective whole-cell screens.

Surrogate markers, easily assayed reporter molecules whose activity is tightly
coupled to the activity of the polypeptide being studied, may be used as a means of
assaying antibiotics. The invention further provides a host cell comprising a vector
as defined herein and a reporter gene encoding a reporter molecule whose activity is
linked to that of the polypeptide encoded by the vector. Examples of such systems
include a transcriptional fusion of the E. coli lacZ gene to the vanH promoter in a B.
subtilis strain expressing VanS and VanR as a reporter for inhibition of cell wall
biosynthesis (J. Bacteriol. (1996) 178:6305-6309), the use of lacZ transcriptional
and translational fusions to rpoB and rpoC to monitor RNA polymerase activity
(Mol. Microbiol. (1996) 19:483-493) and the use of a secA-lacZ gene fusion as a
reporter for inhibition of secA activity (Genetics (1988) 118:571-579).

When the function of a gene is unknown, surrogate markers for the activity of the
gene can be identified using at least two approaches. Proteome mapping, that is two
dimensional electrophoresis coupled with mass spectrometry analysis of isolated
polypeptides, has been used to identify specific polypeptides which increase in
abundance in response to polypeptide or RNA synthesis inhibitors (Microbial &
Comparative Genomics (1996) 1:375). Tightly regulated promoters used to
demonstrate that the E. coli and B. subtilis conserved polypeptides are
essential can also be used to reduce the concentrations of these polypeptides. In a
manner similar to that described above, proteome maps generated from bacteria
depleted of the products of the conserved essential genes can be used to detect
polypeptides which change in abundance as compared to wild-type bacteria.
Transcriptional or translational fusions to these polypeptides can be used as reporter
molecules to screen for antagonists of members of the conserved essential gene
family. As an alternative to proteome mapping, transposons or other mobile genetic
elements containing reporter genes can be used to search for reporter molecules. Such
an approach has been used to identify vancomycin responsive genes in S. aureus
(Antibiot. (Tokyo) (1991) 44:210-217). As with proteome mapping, bacteria in
which conserved essential genes are controlled by tightly regulated promoters can
be used to screen for transposon carrying strains in which expression of
promoterless reporter genes is induced upon depletion of the polypeptides.

Once a reporter gene has been identified, screening of compounds for induction or
inhibition of the marker can be undertaken. Standard broth or plate assays can be
used in many different formats. Such assays will detect molecules which antagonise
the response which couples the activity of the conserved, target polypeptide to the
reporter molecule. Thus, the compounds identified may act directly upon the target
polypeptide or on another stage in the pathway which leads to activation of the
reporter.

Screens for inhibitors of the target which do not require the use of surrogate markers
may be designed by manipulating expression levels of the target polypeptide. For
example, quinolone resistant strains of E. coli have been made by over-expression of
gyrA (FEMS Microbiol. Lett. (1997) 154:271-276), over-expression of alanine
racemase has been shown to increase resistance to cycloserine in M. smegmatis (J.
Bacteriol. (1997) 179:5046-5055), and multicopy plasmids carrying murZ have been
shown to increase phosphomycin resistance in both E. coli (J. Bacteriol. (1992)
174:5748-5752) and A. calcoaceticus (FEMS Microbiol. Lett. (1994) 117:137-142).
Similarly, strains more sensitive to antibiotics may be made by reducing expression
levels of the polypeptide targeted by the antibiotic. Over- or under-expression of
members of the conserved, essential gene family may be used to screen for
antibiotics which act either directly on the gene or gene product or indirectly on the
pathway in which it is involved.

Another example of an assay for antagonists is a competitive assay that combines
the polypeptide of the present invention and a potential antagonist with
membrane-bound binding molecules, recombinant binding molecules, natural substrates or
ligands, or substrate or ligand mimetics, under appropriate conditions for a
competitive inhibition assay. The polypeptide can be labelled, such as by
radioactivity or a colorimetric compound, such that the number of polypeptide
molecules bound to a binding molecule or converted to product can be determined
accurately to assess the effectiveness of the potential antagonist.

The present invention therefore provides a method of assaying compounds for 
activity against bacteria comprising: 

i) providing a polypeptide according to the invention; 

ii) contacting said polypeptide with candidate inhibitory compounds; and 

iii) measuring for binding to said polypeptide or fragment. 

The present invention also provides a method of assaying compounds for activity 
against bacteria comprising: 

i) expressing a polypeptide according to the invention in a host cell; 

ii) contacting said cell with candidate inhibitory compounds; and 

iii) measuring cell death. 

The present invention further provides a method of screening for an antibiotic which 
method comprises: 

i) transfecting a host cell with a vector comprising a polynucleotide encoding a 
polypeptide as defined herein; 

ii) allowing the host cell to express the polynucleotide; 

iii) increasing the level of expression of the polypeptide as defined herein; and 

iv) assaying for increased resistance. 

Alternatively the method may be carried out as above but the level of expression of 
the polypeptide is decreased and the cells are assayed for increased sensitivity to an 
inhibitor. 
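The read-out of such an expression-level screen may be sketched as below. This is an illustration under assumptions not stated in the text: resistance is taken to be measured as a minimum inhibitory concentration (MIC), and the four-fold cut-off and all values are hypothetical.

```python
from typing import Optional


def is_target_linked(mic_wild_type: float,
                     mic_over_expressing: Optional[float] = None,
                     mic_under_expressing: Optional[float] = None,
                     min_fold: float = 4.0) -> bool:
    """Flag a compound whose potency tracks the expression level of the target."""
    # Over-expression of the target is expected to increase resistance.
    more_resistant = (
        mic_over_expressing is not None
        and mic_over_expressing / mic_wild_type >= min_fold
    )
    # Reduced expression of the target is expected to increase sensitivity.
    more_sensitive = (
        mic_under_expressing is not None
        and mic_wild_type / mic_under_expressing >= min_fold
    )
    return more_resistant or more_sensitive


# Hypothetical MIC values in arbitrary concentration units.
print(is_target_linked(2.0, mic_over_expressing=16.0))
print(is_target_linked(2.0, mic_over_expressing=2.0, mic_under_expressing=2.0))
```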

The present invention also provides a method of assaying compounds for activity
against bacteria comprising:

i) generating a bacterial strain containing a reporter gene linked to the gene
encoding a polypeptide according to the invention;

ii) contacting said strain with candidate inhibitory compounds; and 

iii) measuring for induction or inhibition of said marker. 

Potential antagonists include small organic molecules and ions which interact
specifically with a polypeptide or polynucleotide, for example a substrate, cell
membrane component, receptor, a fragment thereof or a peptide. Such molecules may
include antibodies, antibody-derived reagents or chimaeric molecules.

Potential antagonists also may be small organic molecules, a peptide, a polypeptide 
such as a closely related protein or antibody that binds to the same sites on a binding 
molecule without inducing functional activity of the polypeptide of the invention. 

The antibodies may be monoclonal or polyclonal. Techniques for producing 
monoclonal and polyclonal antibodies which bind to a particular polypeptide are now 
well developed in the art. They are discussed in standard immunology textbooks, for
example in Roitt et al (Immunology, Churchill Livingstone, 2nd Edition (1989)).

In addition to whole antibodies, the present invention covers variants thereof which 
are capable of binding to an epitope present on a substance of the present invention.
The variants may be antibody fragments or synthetic constructs. Examples of 
antibody fragments and synthetic constructs are given by Dougall et al in Tibtech 12 
372-379 (September 1994). Antibody fragments include Fab and Fv fragments. 

Other synthetic constructs include CDR peptides. These are synthetic peptides 
comprising antigen binding determinants. Peptide mimetics may also be used. These 
molecules are usually conformationally restricted organic rings which mimic the 
structure of a CDR loop and which include antigen-interactive side chains. Synthetic 
constructs include chimaeric molecules. Thus, for example, humanised antibodies or 
derivatives thereof are within the scope of the present invention. An example of a
humanised antibody is an antibody having human framework regions, but rodent or
other non-human hypervariable regions. Synthetic constructs also include molecules
comprising a covalently linked moiety which provides the molecule with some
desirable property in addition to antigen binding. For example the moiety may be a
label (e.g. a fluorescent or radioactive label) or a pharmaceutically active agent.

Other potential antagonists include antisense molecules (see Okano, J. Neurochem. 
56:560 (1991); Oligodeoxynucleotides As Antisense Inhibitors Of Gene Expression, 
CRC Press, Boca Raton, FL (1988), for a description of these molecules). 

In a particular aspect the invention provides the use of the polypeptide, 
polynucleotide or antagonist of the invention to interfere with the initial physical 
interaction between a pathogen and mammalian host responsible for sequelae of 
infection. 

The invention further includes molecules which block the function of the 
polypeptides according to the invention or a polynucleotide encoding the same, 
identifiable by any of the above described methods. 

An antagonist of the invention may be provided in pharmaceutical compositions 
which may include a carrier. They may be provided in unit dosage form. Such agents 
and pharmaceutical compositions are within the scope of the present invention. In 
order to prepare such pharmaceutical compositions the inhibitors will normally be 
provided in substantially pure form. They can then be combined with a carrier under 
sterile conditions. 

The present invention also provides a method of treatment which comprises 
administering to a patient an effective amount of an antagonist of the expression or 
function of a polypeptide as defined herein. 

The present invention further provides the use of an antagonist of a polypeptide as 
defined herein or a polynucleotide encoding the same for the manufacture of a 
medicament for the treatment of a bacterial infection. 

Figures 

Figure 1 shows the multiple sequence alignment and BLAST based identification of
the ygjD family members.

Figures 2a-d show position-dependent scoring matrices for profile-based
identification of ygjD family members.

Figure 3 shows the PROSITE patterns of ygjD family members based on the motifs
generated from the position dependent scoring matrices.

Figure 4 shows the outline cloning strategy for a gene disruption plasmid. The
black box represents the adapter sequence.

Figure 5 shows growth dependence on arabinose of a conditional mutant in the E.
coli gene ygjD. An E. coli MG1655 derivative in which the chromosomal araBAD
genes have been replaced with ygjD and the native ygjD gene has been deleted is
shown on the upper half of each plate and a wild-type control is shown on the lower
half of each plate.

Figure 6 is a diagram of the vector used to create conditional mutants in B. subtilis. 

Figure 7 shows growth dependence on xylose of a conditional mutant in the B.
subtilis ygjD orthologue yidE.

Figure 8 shows over-expression of the ygjD protein.

SDS-PAGE of E. coli MG1655/pASK-ygjD (Lanes 1, 3, 5) and MG1655/pASK75
(Lanes 2 and 4) whole-cell extracts. M: molecular weight standard. Lane 1:
uninduced. Lanes 2 and 3: 1 hour induction. Lanes 4 and 5: 3 hours induction.

Examples 

Example 1. Identification of conserved bacterial open reading frames. 

The predicted open reading frames obtained from the complete E. coli genomic
sequence (Science (1997) 277:1453-1474) were compared in a serial manner to the
predicted open reading frames of the H. influenzae (Science (1995) 270:397-403),
M. genitalium (Science (1995) 270:397-403), Synechocystis (Nuc. Acids Res. (1998)
26:63-67) and B. subtilis (Nature (1997) 390:249-256) complete genome sequences
using the BLAST algorithm (J. Mol. Biol. (1990) 215:403-10). All matches with a
BLAST score of greater than 75 were then analysed in a pair-wise fashion using the
SIM algorithm (Advances in Applied Mathematics (1991) 12:337-357). The SIM
score was then divided by a "selfSIM" score, a value obtained when the query
protein is compared to itself using the SIM algorithm with the PAM200 matrix, to
yield a similarity value of between 0 and 1.0. Proteins for which this similarity value
was greater than 0.2 when the E. coli protein was compared to either the B. subtilis
or M. genitalium genome were then compiled into a list and manually screened to
identify proteins of unknown function. Those open reading frames which also had high
similarity values in other bacteria were then considered as candidate genes and
targets for gene disruption.
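The filtering steps of this example may be sketched as follows. The sketch assumes that the BLAST, SIM and selfSIM scores have already been computed elsewhere; the `Match` record, the function names and every score shown are hypothetical and serve only to illustrate the two thresholds.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Match(NamedTuple):
    query: str          # E. coli open reading frame
    genome: str         # genome in which the match was found
    blast_score: float  # BLAST score of the match
    sim_score: float    # SIM score of query against match
    self_sim: float     # SIM score of the query against itself


def similarity_value(sim_score: float, self_sim: float) -> float:
    """SIM score normalised by the selfSIM score, giving a value between 0 and 1."""
    if self_sim <= 0:
        raise ValueError("selfSIM score must be positive")
    return sim_score / self_sim


def shortlist(matches: Iterable[Match],
              blast_cutoff: float = 75.0,
              similarity_cutoff: float = 0.2,
              required: Tuple[str, ...] = ("B. subtilis", "M. genitalium")) -> List[str]:
    """Queries passing both thresholds against either of the required genomes."""
    kept = set()
    for m in matches:
        if m.blast_score <= blast_cutoff or m.genome not in required:
            continue
        if similarity_value(m.sim_score, m.self_sim) > similarity_cutoff:
            kept.add(m.query)
    return sorted(kept)


# Hypothetical scores for four invented open reading frames.
matches = [
    Match("orfA", "B. subtilis", 210.0, 480.0, 1600.0),    # similarity 0.30
    Match("orfB", "M. genitalium", 90.0, 150.0, 1500.0),   # similarity 0.10
    Match("orfC", "B. subtilis", 60.0, 900.0, 1000.0),     # BLAST score too low
    Match("orfD", "H. influenzae", 300.0, 900.0, 1000.0),  # not a required genome
]
print(shortlist(matches))
```

The manual screening for proteins of unknown function and the check for high similarity values in other bacteria are not modelled here.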

Example 2. Demonstration of essentiality of ygjD genes in E. coli

2A - In-frame deletion of selected genes in E. coli.

A disruption plasmid was constructed using DNA containing an in-frame deletion of
the gene of interest plus ~900 base pairs of 5' and 3' flanking DNA for homologous
recombination. This DNA was cloned into the gene-replacement vector pKO3 as
follows: Two separate PCR reactions were used to amplify fragments of
approximately 900 base pairs of 5' and 3' sequence flanking the gene of interest.
Chromosomal DNA from E. coli strain MG1655 was used as the template. Primers
2 and 3 carry a 5' extension of a 33 bp adapter sequence:

adapter sequence forward direction 5'-gttataaatttggagtgtgaaggttattgcgtg;
adapter sequence reverse direction 5'-cacgcaataaccttcacactccaaatttataac.

Subsequently, the 2 PCR products were purified using High Pure™ PCR Product
Purification Kit (Boehringer Mannheim Inc., Mannheim, GE). Using the adapter
sequence, the 2 PCR products were assembled in a second PCR reaction to give a
single product. Following restriction enzyme digestion, preparative agarose gel
electrophoresis and purification using Jetsorb™ Gel Extraction Kit (Genomed Inc.)
the final product was cloned into pKO3 using standard techniques. This clone is
referred to as the disruption plasmid. All PCR reactions described in this section
were performed with PWO™ DNA Polymerase (Boehringer Mannheim Inc.,
Mannheim, GE). In the final product the gene of interest was deleted from the start
to the stop codon and replaced by the 33 bp adapter sequence [e.g. 5'-
ATGgttataaatttggagtgtgaaggttattgcgtgTAA-3']. As a consequence the reading frame
is maintained.
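The properties of the adapter relied upon above can be checked with a few lines of code. The sketch below uses only the sequences given in this example; the helper name `reverse_complement` is introduced for illustration.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, preserving letter case."""
    return seq.translate(COMPLEMENT)[::-1]


forward = "gttataaatttggagtgtgaaggttattgcgtg"
reverse = "cacgcaataaccttcacactccaaatttataac"

# Start codon, adapter and stop codon, as left in the chromosome after deletion.
scar = "ATG" + forward + "TAA"
codons = [scar[i:i + 3].upper() for i in range(0, len(scar), 3)]
internal_stops = [c for c in codons[1:-1] if c in STOP_CODONS]

print(len(forward))                            # adapter length in base pairs
print(reverse_complement(forward) == reverse)  # the two adapters are complementary
print(len(scar) % 3 == 0, internal_stops)      # frame kept, no premature stop
```

Because the adapter length is a multiple of three and contains no stop codon in frame, the remaining short open reading frame terminates at the original stop codon.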

2B - Construction of an in-frame deletion mutant of Escherichia coli 

The disruption vector pKO3 (A.J. Link et al., J. Bacteriol. 179:6228-6237, 1997) is a
derivative of pMAK700 (C.A. Hamilton et al., J. Bacteriol. 171:4617-4622). It
features the repA(Ts) replication origin derived from pSC101 [permissive at 30°C
but inactive at 42 to 44°C], the cat gene encoding chloramphenicol resistance and
the sacB gene for counter selection against vector sequences in the presence of 5%
sucrose.

The disruption plasmid described above was transformed into MG1655.
Subsequently, chromosomal integrants (cointegrates produced by a single
homologous recombination event) of the plasmid were isolated by selecting clones
on chloramphenicol at 44°C. Following two rounds of purification under the same
conditions, the cointegrates were grown at 30°C in the presence of 5% sucrose to
force resolution of the cointegrate and elimination of the plasmid from the cell. At
this step, a preliminary assignment was made as to whether a given gene is essential
or non-essential for growth of E. coli in complex media. The genotype of the
chloramphenicol-sensitive clones obtained following cointegration and resolution of
the disruption plasmid was determined by colony PCR using primers c1 and c2 (see
Fig. 4). In the case of a non-essential gene, the second recombination event can result
in either a wild-type or a mutant genotype. Testing of 20 independent clones
routinely showed an approximately 1:1 distribution of wild-type to mutant genotypes
in the case of a non-essential gene. Recovery of only the wild-type genotype in 50
independent clones was considered preliminary evidence for a gene's essentiality.
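
The weight of this preliminary evidence can be illustrated with a simple calculation. The sketch below assumes, purely for illustration, that clones are independent and that resolution at a non-essential gene yields wild-type and mutant genotypes with equal probability (the approximately 1:1 ratio reported above). Under that assumption, the chance of recovering only wild-type clones by accident is 0.5 raised to the number of clones tested.

```python
def prob_all_wild_type(n_clones: int, p_wild_type: float = 0.5) -> float:
    """Probability that n independent clones are all wild-type.

    Assumes each clone is wild-type with probability p_wild_type
    (0.5 corresponds to the ~1:1 ratio seen for non-essential genes).
    """
    return p_wild_type ** n_clones


p20 = prob_all_wild_type(20)
p50 = prob_all_wild_type(50)
print(f"{p20:.2e}")  # 9.54e-07
print(f"{p50:.2e}")  # 8.88e-16
```

Under these assumptions, finding only wild-type genotypes in 50 clones would be extremely unlikely for a non-essential gene, which is why such a result is treated as preliminary evidence of essentiality.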

2C - Construction of a conditional mutant and final proof that a given gene is 
essential for growth of E. coli 

A vector, pRDC15, was designed that allows a copy of a putative essential gene to
be placed in an ectopic position on the chromosome under the control of a tightly
regulated promoter. The plasmid is a derivative of pKO3. In addition to the
attributes of pKO3, pRDC15 carries a DNA fragment consisting of the araC gene,
the arabinose promoter, a cloning site [BamHI-NheI-SfiI-XhoI-SphI-SfiI] and the
polB gene. The wild-type copy of a putative essential gene was amplified by PCR
and cloned into the vector pRDC15 using restriction sites NheI and XhoI. The
resulting construct was used for gene replacement in a manner identical to the
disruption plasmids described above. In this case the araC and polB genes of
pRDC15 represent the homologous DNA for recombination at the araCBADpolB
locus of the E. coli chromosome. Following cointegration and resolution, the
araBAD genes in the E. coli chromosome are replaced by the wild-type copy of the
gene of interest, which is now under the control of the arabinose promoter. This
merodiploid strain is then used to construct an in-frame deletion of the wild-type
target gene using the disruption plasmid described above in the presence of 0.2%
arabinose. In this case, the deletion mutant can be obtained since a wild-type copy is
expressed in trans from the arabinose locus. The resulting strain is a conditional
mutant as expression of the target gene is now dependent on the presence of
arabinose. The inability of such a strain to grow in the absence of arabinose is
final proof that a given gene is essential for growth of E. coli. Figure 5 shows that the
gene ygjD is essential in E. coli.

Example 3 - yidE is an essential gene in Bacillus subtilis.

3A - Construction of a B. subtilis integrative plasmid for xylose controlled gene
expression.

An integrative plasmid allowing the expression of genes under the control of a
xylose-inducible promoter was constructed as follows: a DNA fragment carrying
the repressor gene xylR and the xylA promoter was PCR amplified from B. subtilis
genomic DNA with the following primers:

pxyl-4: 5'-atcgctcgagAGATGCACCTTCTATACCCG-3'
pxyl-7: 5'-atcgaagcttAGCGATCCTACACAATCATG-3'

The primers were designed such that they introduced a unique EcoRI site at the 5'
end of the PCR product and a unique BamHI site at the 3' end of the product. The
PCR fragment was then cloned as an EcoRI-BamHI fragment into the B. subtilis
integrative vector pDG648 to yield pRDC9 (Figure 6).

3B - Construction of the disruption plasmid. 

A DNA fragment containing approximately 100 bp of sequence from the 5' region of
yidE was amplified by PCR from B. subtilis genomic DNA. The PCR primers were
designed such that the resulting PCR product contains unique restriction sites at
both the 5' and 3' ends of the PCR product. Subsequently, the PCR product was
cloned into pRDC9.

3C - Construction of a conditional mutant. 

The disruption plasmid was inserted into B. subtilis strain JH642. Chromosomal
integration of the plasmid via single-reciprocal Campbell-like recombination at the
yidE locus was driven by selection on LB plates containing
erythromycin (1 µg/ml), lincomycin (25 µg/ml) and 10 mM xylose. The resulting
strain is a conditional mutant in which expression of yidE is dependent on the
presence of xylose in the growth medium.

3D - Confirmation that yidE is an essential gene. 

Confirmation that yidE is essential for growth was obtained by streaking the yidE
conditional mutant on LB plates containing erythromycin (1 µg/ml) and lincomycin
(25 µg/ml), with or without 10 mM xylose. The strain formed single colonies only on
xylose-containing plates, thereby indicating that expression of yidE is indispensable
for growth (Figure 7).

Example 4 - Characterisation of the ygjD polypeptide family 

4A - Repetitive BLAST searches 

Repetitive BLAST searches (Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and
Lipman, D.J. (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10)
were performed in which each of the ygjD protein family members described below was
used in succession as a query sequence. Other members of the ygjD family were
identified as proteins which yield high-scoring segment pair (HSP) scores of greater
than 100 in comparison to at least one member of the ygjD polypeptide sequences
shown in Figure 1 when a BLOSUM62 scoring matrix is used.
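
The repetitive search procedure can be sketched as a loop that keeps querying with newly found members until no further sequences pass the threshold. The Python outline below is a sketch only: `hsp_score` is a hypothetical placeholder for a real BLAST comparison using the BLOSUM62 matrix and is not a BLAST API, and `expand_family` is an illustrative name. Only the control flow and the score threshold of 100 are taken from the text.

```python
from typing import Callable, Dict, Iterable, Set


def expand_family(
    seeds: Iterable[str],
    database: Dict[str, str],
    hsp_score: Callable[[str, str], float],
    threshold: float = 100.0,
) -> Set[str]:
    """Iteratively collect sequences scoring above `threshold` against any member.

    `seeds` are identifiers that must be present in `database`
    (identifier -> sequence). `hsp_score` is a hypothetical stand-in for a
    BLAST HSP score and must be supplied by the caller.
    """
    family = set(seeds)
    queue = list(family)
    while queue:
        query_id = queue.pop()
        query_seq = database[query_id]
        for candidate_id, candidate_seq in database.items():
            if candidate_id in family:
                continue
            if hsp_score(query_seq, candidate_seq) > threshold:
                # A new member becomes a query in a later round.
                family.add(candidate_id)
                queue.append(candidate_id)
    return family
```

Using each new member as a query in turn is what lets the search reach distant relatives that score poorly against the original seed but well against an intermediate family member.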

Sources for each of the sequences set out in Figure 1 are given below: 

H. influenzae - GCP, Swissprot accession number P43764
P. haemolytica - GCP, Swissprot accession number P36175
E. coli - ygjD, Swissprot accession number P05852
M. leprae - Y246, Swissprot accession number P37969
M. tuberculosis - Y09A, Swissprot accession number Q50709
S. epidermidis - Glaxo Wellcome S. epidermidis genomic sequencing project ORF
    Z0254002
B. subtilis - yidE, Swissprot/trEMBL accession number O05518
S. pyogenes - Contig229 from S. pyogenes genome sequencing project, B.A. Roe, S.
    Clifton, Mike McShan and Joseph Ferretti
    (http://www.genome.ou.edu/strep.html), August 25, 1997 data release
S. pneumoniae - Glaxo Wellcome S. pneumoniae genomic sequencing project contig
    SP09_0003
Synechocystis - Y807, Swissprot accession number P74034
B. burgdorferi - EMBL accession number G2688702
T. pallidum - contig 6278 from the T. pallidum genome sequencing project at
    http://www.ncbi.nlm.nih.gov/BLAST/tigr_db.html
M. genitalium - GCP, Swissprot accession number P47292
M. pneumoniae - GCP, Swissprot accession number P75055
A. thaliana - F4L23.22, Swissprot/trEMBL accession number O22145
H. pylori - GCP, Swissprot accession number P55996

4B - Profile based searches 

Multiple sequence alignments of the ygjD family members have been used to
identify short patterns of amino acid sequences, which are common to all of the
family members. Four motifs have been identified in the ygjD gene family using the
motif discovery tool, MEME (Bailey, T. L. and Elkan, C., Fitting a mixture model
by expectation maximization to discover motifs in biopolymers, Proceedings of the
Second International Conference on Intelligent Systems for Molecular Biology, pp.
28-36, AAAI Press, Menlo Park, California, 1994). Each of the four motifs is
shown as it exists in each of the family members and is explicitly described as a
position-dependent scoring matrix, or profile. Together these profiles can be
used by the motif alignment and search tool, MAST, described in the same
reference, to search databases for ygjD family members, which are positively
identified when p-values of less than 1 x 10^-50 are obtained. The p-values are
based on a random sequence model that assumes each position in a random
sequence is generated according to the average letter frequencies of all sequences in
the peptide non-redundant database (ftp://ncbi.nlm.nih.gov/blast/db/) on September
22, 1996.

Tables 1 to 4 show the position-dependent scoring matrices used to define the ygjD
family. Values in each position-dependent scoring matrix are calculated by taking the
log (base 2) of the ratio p/f at each position in the motif, where p is the probability
of a particular letter at that position in the motif, and f is the average frequency of
that letter in the training set. Columns correspond to one-letter amino acid codes and
rows correspond to the position in the motif.
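
The log-odds calculation described above can be written out in a few lines. The Python sketch below is illustrative only: the background frequencies and motif probabilities are invented for the example and are not taken from Tables 1 to 4, and the function names are hypothetical. Real motif tools also adjust zero probabilities before taking the logarithm, which this sketch does not attempt.

```python
import math


def log_odds_score(p: float, f: float) -> float:
    """Return log2(p / f) for motif probability p and background frequency f."""
    if p <= 0 or f <= 0:
        raise ValueError("p and f must be positive")
    return math.log2(p / f)


def score_window(window: str, matrix: list) -> float:
    """Sum the per-position scores of `window` against a scoring matrix.

    `matrix` has one row (dict: amino acid -> score) per motif position.
    """
    return sum(row[residue] for row, residue in zip(matrix, window))


# Hypothetical two-letter alphabet and two-position motif, for illustration.
background = {"G": 0.07, "H": 0.02}
position_probs = [
    {"G": 0.56, "H": 0.01},  # position 1 strongly prefers G
    {"G": 0.01, "H": 0.64},  # position 2 strongly prefers H
]
matrix = [
    {aa: log_odds_score(probs[aa], background[aa]) for aa in background}
    for probs in position_probs
]

score_gh = score_window("GH", matrix)  # matches the motif: positive score
score_hg = score_window("HG", matrix)  # mismatches the motif: negative score
print(score_gh, score_hg)
```

A residue that is more common at a motif position than in the background contributes a positive score, and one that is rarer contributes a negative score, so a window's total indicates how much better the motif model explains it than the random model.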

4C - PROSITE based searches

The conserved sequence elements identified with MEME can also be represented as
PROSITE patterns using the conventions outlined in PROSITE: A dictionary of protein
sites and patterns (http://www.expasy.ch/sprot/prosite.html) and Bairoch A., Bucher P.,
Hofmann K. The PROSITE database, its status in 1995. Nucleic Acids Res.
24:189-196 (1995). YgjD family members are positively identified when exact matches
to any one of the four PROSITE patterns (pattern 1, pattern 2, pattern 3 or pattern 4,
as set out in Figure 3) are found in the protein sequence. Alternatively, ygjD family
members can be identified using PROSITE pattern PS01016 found in the PROSITE database.
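
An exact match to a PROSITE-style pattern can be tested by translating the pattern into a regular expression. The Python sketch below is a minimal illustration and not a full PROSITE parser: it handles only single residues, bracketed alternatives, X, and repeat counts of the form (n) or (n,m). The function name and the test sequence are hypothetical; the pattern is the third one listed in claim 1(iii).

```python
import re

# One pattern element: a residue, [alternatives] or X, with an optional
# repeat count written as (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".;").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue in ("X", "x"):
            residue = "."  # X stands for any one amino acid residue
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        parts.append(residue + repeat)
    return "".join(parts)


pattern = "[VIL]-L-[GSAT]-[VILFM]-E-[TS]-[TS]-C-D-[DE]"
regex = prosite_to_regex(pattern)

# Hypothetical test sequence, used only to demonstrate the matching step.
hit = re.search(regex, "MRVLGIETSCDDTG")
print(regex, hit.group() if hit else None)
```

A sequence is then scored as a positive identification if `re.search` returns a match for any one of the patterns.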

Example 5 - Over-expression of the E. coli ygjD polypeptide 

The E. coli ygjD gene was amplified from E. coli chromosomal DNA in the
presence of 1 µM each of the primers ask-eygjD5 [5'-gatctctagataaagcgaggtaaaacaagtc-3']
and ask-eygjD3 [5'-gatcctcgagtTTAcgcagccggtaactc-3'] and a nucleotide
concentration of 250 µM using Pwo DNA Polymerase (Boehringer, Mannheim,
Germany). 25 cycles of 30 sec at 94 °C/30 sec at 58 °C/1 min at 72 °C with a final 5
min extension at 72 °C were performed. The purified PCR product was cleaved with
XbaI and XhoI and cloned into the expression vector pASK75 (Gene (1994)
151:131-135) cut with the same restriction endonucleases. The cloned ygjD gene was
sequenced. The resulting plasmid pASK-ygjD was transformed into E. coli MG1655. For
each strain, 50 ml of LB medium containing 100 µg/ml carbenicillin was inoculated
with 0.5 ml of an MG1655/pASK-ygjD or MG1655/pASK75 overnight culture and
incubated at 30 °C. At an optical density of 0.65 at 600 nm, the cultures were induced
with 200 ng/ml anhydrotetracycline. At the time of induction and 1 and 3 hours after
induction, samples of 1 ml were withdrawn, and the cells were harvested by
centrifugation and resuspended in 1x SDS-PAGE sample buffer (140 µl per 1 OD600
equivalent). The samples were boiled for 5 minutes and analyzed on a 4-20% SDS-PAGE
gradient gel stained with Coomassie Brilliant Blue. Induction of a 36 kDa protein
representing YgjD can be seen 1 and 3 hours following induction (Figure 8).
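
The cloning strategy relies on the XbaI and XhoI sites carried in the 5' tails of the two primers. As a quick consistency check, the Python sketch below locates the standard recognition sequences for these two enzymes (TCTAGA for XbaI, CTCGAG for XhoI) in the primer sequences quoted above; the function and variable names are illustrative.

```python
# Standard recognition sequences of the two enzymes used for cloning.
SITES = {"XbaI": "TCTAGA", "XhoI": "CTCGAG"}

# Primer sequences as quoted in the text (5' to 3').
PRIMERS = {
    "ask-eygjD5": "gatctctagataaagcgaggtaaaacaagtc",
    "ask-eygjD3": "gatcctcgagtTTAcgcagccggtaactc",
}


def find_sites(sequence: str) -> dict:
    """Map enzyme name -> 0-based position of its site, for sites present."""
    sequence = sequence.upper()
    found = {}
    for enzyme, site in SITES.items():
        position = sequence.find(site)
        if position != -1:
            found[enzyme] = position
    return found


sites_by_primer = {name: find_sites(seq) for name, seq in PRIMERS.items()}
print(sites_by_primer)
```

Each primer carries exactly one of the two sites, so the amplified fragment can be inserted into pASK75 in a defined orientation.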

CLAIMS

1. An isolated polypeptide of the ygjD family as defined by:

i) an HSP score of greater than or equal to 100 when compared with one of the
sequences of Figure 1 when the BLAST algorithm is used with a BLOSUM62
scoring matrix; or

ii) containing a set of amino acid sequences which are positively identified when
position dependent scoring matrices according to Tables 1-4 are used with
MAST to yield a p-value of less than 1 x 10^-50; or

iii) comprising at least one of the following amino acid sequences: 

[LIV](2)-[SCT]-G-G-H-X(17,21)-D-D-[AST]-X-G-E-X(2)-D-K;
A-X(3)-P-G-L-X(3)-L-X(2)-G-X(13)-P-X(5)-H-X(3)-H;
[VIL]-L-[GSAT]-[VILFM]-E-[TS]-[TS]-C-D-[DE]; and
G-[LTV]-V-P-E-[LIV]-A-[AST]-R-X-H;

where

the letters denote an amino acid in one letter code,

the square brackets denote a single amino acid,

the amino acids within the square brackets are alternatives,

X is any one amino acid residue, and

the numbers in the curved brackets refer to the number of residues at that position;

or

iv) [KR]-[GSAT]-X(4)-[FYWLH]-[DQNGK]-X-P-X-[LIVMFY]-X(3)-H-X(2)-[AG]-H-[LIVM]

where

the letters denote an amino acid in one letter code,

the square brackets denote a single amino acid,

the amino acids within the square brackets are alternatives,

X is any one amino acid residue, and

the numbers in the curved brackets refer to the number of residues at that position.

2. A polypeptide or fragment according to claim 1 comprising all three of the sequences
listed in iii).

3. A polypeptide containing any of the sequences set out in Figures 2a-2d. 

4. A polypeptide according to any of claims 1-3 wherein said polypeptide is from
Borrelia burgdorferi, Treponema pallidum, Synechocystis sp. Strain PCC6803,
Helicobacter pylori, Arabidopsis thaliana, Haemophilus influenzae, Mycobacterium
tuberculosis, Mycobacterium leprae, Pasteurella haemolytica, Mycoplasma genitalium,
Mycoplasma pneumoniae, Streptococcus pneumoniae, Streptococcus pyogenes,
Bacillus subtilis or Escherichia coli.

5. A polypeptide according to any of claims 1-4 for use in a method of screening for 
agents with antibiotic activity. 

6. An isolated polynucleotide encoding a polypeptide as defined in any of claims 1-4. 

7. A vector comprising a transcriptional regulatory sequence and a nucleotide sequence 
encoding a polypeptide as defined in any of claims 1-4. 

8. A host cell comprising a vector as claimed in claim 7 and a reporter gene whose 
activity is linked to the expression of the polypeptide according to any of claims 1-4. 

9. A method of assaying compounds for activity against bacteria comprising: 

i) providing a polypeptide according to the invention;

ii) contacting said polypeptide with an antagonist; and 

iii) measuring for binding to said polypeptide. 

10. A method of assaying compounds for activity against bacteria comprising: 

i) expressing a polypeptide or fragment thereof according to any of claims 1-4 in a
host cell;

ii) contacting said polypeptide with an antagonist; and 

iii) measuring for inactivation of said polypeptide. 

11. A method of assaying compounds for activity against bacteria comprising:

i) providing a polypeptide according to the invention; 

ii) contacting said polypeptide with an antagonist; and 

iii) measuring for cell death. 

12. A method of assaying compounds for activity against bacteria comprising: 

i) transfecting a host cell with a vector comprising a polynucleotide encoding 
a polypeptide as defined herein; 

ii) allowing the host cell to express the polynucleotide; 

iii) increasing the level of expression of the polypeptide as defined herein; 
measuring for binding to said polypeptide; and 

iv) assaying for increased resistance. 

13. A method of assaying compounds for activity against bacteria comprising: 

i) transfecting a host cell with a vector comprising a polynucleotide encoding 
a polypeptide as defined herein; 

ii) allowing the host cell to express the polynucleotide; 

iii) decreasing the level of expression of the polypeptide as defined herein;
measuring for binding to said polypeptide; and

iv) assaying for increased sensitivity to an inhibitor.

14. A method of assaying compounds for activity against bacteria comprising:

i) generating a bacterial strain containing a reporter gene linked to the gene 
encoding a polypeptide according to the invention; 

ii) contacting said strain with an antagonist; and 

iii) measuring for induction or inhibition of said marker.

15. An antagonist of a polypeptide as defined in any of claims 1-4 identifiable by a 
method according to any of claims 9-14 for use in therapy. 

16. Use of an antagonist of a polypeptide as defined in any of claims 1-4 identifiable by
a method according to any of claims 9-14 for the manufacture of a medicament for
the treatment of a bacterial infection.

17. A method of treatment which comprises administering to a patient an effective
amount of an antagonist of a polypeptide as defined in any of claims 1-4 identifiable by
any of the methods according to claims 9-14.